
[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128 #149461

Merged
merged 6 commits into from
Aug 12, 2025

Conversation

badumbatish
Contributor

@badumbatish badumbatish commented Jul 18, 2025

Fixes #149230

Previously, even with SIMD enabled via -mattr=+simd128, the compiler could not use v128 to optimize loads and setcc of i128; it legalized them into consecutive i64 operations instead.

This PR adds support for setcc of i128 by converting the operands to v16i8 and using anytrue and alltrue; as a consequence, memcmp of 16 bytes or more benefits when simd128 is present.

The optimization is enabled when each comparison operand is either a load or an integer of type i128, the condition code is EQ or NE, and the function does not have the NoImplicitFloat attribute.

Inspiration taken from RISCV's isel lowering.
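
As a rough illustration of what the combine computes, here is a scalar C++ model, not the backend code itself. It compares two 16-byte values the old way (two 64-bit halves, xor/or, test for zero, mirroring the i64 sequence removed from the test) and the new way (sixteen byte lanes followed by an all-lanes-true reduction for eq, or an any-lane-true reduction for ne). All function names are hypothetical, and pairing ne with any-true is an assumption based on the "anytrue and alltrue" wording above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<std::uint8_t, 16>;

// Model of the old lowering: split the 16 bytes into two 64-bit halves,
// xor matching halves, or the results together, and test for zero.
bool equalViaTwoI64(const Bytes16 &a, const Bytes16 &b) {
  std::uint64_t a0, a1, b0, b1;
  std::memcpy(&a0, a.data(), 8);
  std::memcpy(&a1, a.data() + 8, 8);
  std::memcpy(&b0, b.data(), 8);
  std::memcpy(&b1, b.data() + 8, 8);
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}

// Model of the new lowering for eq: lane-wise byte equality (i8x16.eq),
// then a reduction that is true only if every lane is true (all_true).
bool equalViaV16I8(const Bytes16 &a, const Bytes16 &b) {
  bool allTrue = true;
  for (std::size_t i = 0; i < a.size(); ++i)
    allTrue = allTrue && (a[i] == b[i]);
  return allTrue;
}

// Assumed model for ne: lane-wise byte inequality, then a reduction that
// is true if at least one lane is true (any_true).
bool notEqualViaV16I8(const Bytes16 &a, const Bytes16 &b) {
  bool anyTrue = false;
  for (std::size_t i = 0; i < a.size(); ++i)
    anyTrue = anyTrue || (a[i] != b[i]);
  return anyTrue;
}
```

Both models agree on every input; the vector form simply replaces the pair of i64 loads and the xor/or chain with one 16-byte load per operand and a single lane-wise compare.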

@badumbatish
Contributor Author

badumbatish commented Jul 18, 2025

Edit: this is resolved.

I'm trying out this PR, but I think I've hit a blocker. The issue shows up with this test case, reduced with llvm-reduce from stest_f64i64 in WebAssembly/fpclamptosat.ll.

I'm not sure how to reconcile this?

...

@badumbatish badumbatish requested review from lukel97 and dschuff July 18, 2025 06:24
@badumbatish
Contributor Author

Alright, following Luke's pointer to PR #114517, I've tried a different approach: rather than making i128 legal everywhere, I only allow 16-byte loads through enableMemCmpExpansion, and rather than modifying the i128 load directly, I hook into setcc instead.
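
A small sketch of why offering a 16-byte load size matters. It assumes a simplified greedy split of the memcmp length into the largest available load sizes; the real memcmp expansion has more logic (overlapping loads, a cap on the number of loads), and `loadSizes` and `splitIntoLoads` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Load sizes offered to the expansion, largest first. The 16-byte entry is
// only present when simd128 is available.
std::vector<unsigned> loadSizes(bool hasSIMD128) {
  std::vector<unsigned> sizes;
  if (hasSIMD128)
    sizes.push_back(16);
  sizes.insert(sizes.end(), {8u, 4u, 2u, 1u});
  return sizes;
}

// Simplified greedy split: repeatedly take the largest load size that still
// fits in the remaining byte count. The size list ends in 1, so the
// remainder always reaches zero.
std::vector<unsigned> splitIntoLoads(unsigned numBytes,
                                     const std::vector<unsigned> &sizes) {
  std::vector<unsigned> loads;
  for (unsigned size : sizes) {
    while (numBytes >= size) {
      loads.push_back(size);
      numBytes -= size;
    }
  }
  return loads;
}
```

Under these assumptions, a 16-byte compare becomes one 16-byte load per operand with simd128 and two 8-byte loads without it. The 16-byte load appears as an i128, which is what the setcc hook then has to pick up.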

@badumbatish badumbatish marked this pull request as ready for review July 31, 2025 23:27
@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-backend-webassembly

Author: Jasmine Tang (badumbatish)

Changes

Fixes #149230


Full diff: https://github.com/llvm/llvm-project/pull/149461.diff

3 Files Affected:

  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (+58-2)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp (+2-1)
  • (modified) llvm/test/CodeGen/WebAssembly/memcmp-expand.ll (+8-14)
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index cd434f7a331e4..ee16f7bf9133d 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -3383,8 +3383,61 @@ static SDValue TryMatchTrue(SDNode *N, EVT VecVT, SelectionDAG &DAG) {
   return DAG.getZExtOrTrunc(Ret, DL, N->getValueType(0));
 }
 
+static SDValue
+combineVectorSizedSetCCEquality(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
+                                const WebAssemblySubtarget *Subtarget) {
+
+  SDLoc DL(N);
+  SDValue X = N->getOperand(0);
+  SDValue Y = N->getOperand(1);
+  EVT VT = N->getValueType(0);
+  EVT OpVT = X.getValueType();
+
+  ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(2))->get();
+  SelectionDAG &DAG = DCI.DAG;
+  // We're looking for an oversized integer equality comparison.
+  if (!OpVT.isScalarInteger() || !OpVT.isByteSized() || OpVT != MVT::i128 ||
+      !Subtarget->hasSIMD128())
+    return SDValue();
+
+  // Don't perform this combine if constructing the vector will be expensive.
+  auto IsVectorBitCastCheap = [](SDValue X) {
+    X = peekThroughBitcasts(X);
+    return isa<ConstantSDNode>(X) || X.getOpcode() == ISD::LOAD;
+  };
+
+  if (!IsVectorBitCastCheap(X) || !IsVectorBitCastCheap(Y))
+    return SDValue();
+
+  // TODO: Not sure what's the purpose of this? I'm keeping here since RISCV has
+  // it
+  if (DCI.DAG.getMachineFunction().getFunction().hasFnAttribute(
+          Attribute::NoImplicitFloat))
+    return SDValue();
+
+  unsigned OpSize = OpVT.getSizeInBits();
+  unsigned VecSize = OpSize / 8;
+
+  EVT VecVT = EVT::getVectorVT(*DCI.DAG.getContext(), MVT::i8, VecSize);
+  EVT CmpVT = EVT::getVectorVT(*DCI.DAG.getContext(), MVT::i8, VecSize);
+
+  SDValue VecX = DAG.getBitcast(VecVT, X);
+  SDValue VecY = DAG.getBitcast(VecVT, Y);
+
+  SDValue Cmp = DAG.getSetCC(DL, CmpVT, VecX, VecY, CC);
+
+  SDValue AllTrue = DAG.getZExtOrTrunc(
+      DAG.getNode(
+          ISD::INTRINSIC_WO_CHAIN, DL, MVT::i32,
+          {DAG.getConstant(Intrinsic::wasm_alltrue, DL, MVT::i32), Cmp}),
+      DL, MVT::i1);
+
+  return DAG.getSetCC(DL, VT, AllTrue, DAG.getConstant(0, DL, MVT::i1), CC);
+}
+
 static SDValue performSETCCCombine(SDNode *N,
-                                   TargetLowering::DAGCombinerInfo &DCI) {
+                                   TargetLowering::DAGCombinerInfo &DCI,
+                                   const WebAssemblySubtarget *Subtarget) {
   if (!DCI.isBeforeLegalize())
     return SDValue();
 
@@ -3392,6 +3445,9 @@ static SDValue performSETCCCombine(SDNode *N,
   if (!VT.isScalarInteger())
     return SDValue();
 
+  if (SDValue V = combineVectorSizedSetCCEquality(N, DCI, Subtarget))
+    return V;
+
   SDValue LHS = N->getOperand(0);
   if (LHS->getOpcode() != ISD::BITCAST)
     return SDValue();
@@ -3532,7 +3588,7 @@ WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,
   case ISD::BITCAST:
     return performBitcastCombine(N, DCI);
   case ISD::SETCC:
-    return performSETCCCombine(N, DCI);
+    return performSETCCCombine(N, DCI, Subtarget);
   case ISD::VECTOR_SHUFFLE:
     return performVECTOR_SHUFFLECombine(N, DCI);
   case ISD::SIGN_EXTEND:
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
index 52e706514226b..08fb7586d215e 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp
@@ -147,7 +147,8 @@ WebAssemblyTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
 
   Options.AllowOverlappingLoads = true;
 
-  // TODO: Teach WebAssembly backend about load v128.
+  if (ST->hasSIMD128())
+    Options.LoadSizes.push_back(16);
 
   Options.LoadSizes.append({8, 4, 2, 1});
   Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
diff --git a/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll b/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
index 8030438645f82..c6df6b50693fa 100644
--- a/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
+++ b/llvm/test/CodeGen/WebAssembly/memcmp-expand.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s  -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers | FileCheck %s
+; RUN: llc < %s  -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s
 
 target triple = "wasm32-unknown-unknown"
 
@@ -132,19 +132,13 @@ define i1 @memcmp_expand_16(ptr %a, ptr %b) {
 ; CHECK-LABEL: memcmp_expand_16:
 ; CHECK:         .functype memcmp_expand_16 (i32, i32) -> (i32)
 ; CHECK-NEXT:  # %bb.0:
-; CHECK-NEXT:    i64.load $push7=, 0($0):p2align=0
-; CHECK-NEXT:    i64.load $push6=, 0($1):p2align=0
-; CHECK-NEXT:    i64.xor $push8=, $pop7, $pop6
-; CHECK-NEXT:    i32.const $push0=, 8
-; CHECK-NEXT:    i32.add $push3=, $0, $pop0
-; CHECK-NEXT:    i64.load $push4=, 0($pop3):p2align=0
-; CHECK-NEXT:    i32.const $push11=, 8
-; CHECK-NEXT:    i32.add $push1=, $1, $pop11
-; CHECK-NEXT:    i64.load $push2=, 0($pop1):p2align=0
-; CHECK-NEXT:    i64.xor $push5=, $pop4, $pop2
-; CHECK-NEXT:    i64.or $push9=, $pop8, $pop5
-; CHECK-NEXT:    i64.eqz $push10=, $pop9
-; CHECK-NEXT:    return $pop10
+; CHECK-NEXT:    v128.load $push1=, 0($0):p2align=0
+; CHECK-NEXT:    v128.load $push0=, 0($1):p2align=0
+; CHECK-NEXT:    i8x16.eq $push2=, $pop1, $pop0
+; CHECK-NEXT:    i8x16.all_true $push3=, $pop2
+; CHECK-NEXT:    i32.const $push4=, 1
+; CHECK-NEXT:    i32.xor $push5=, $pop3, $pop4
+; CHECK-NEXT:    return $pop5
   %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
   %res = icmp eq i32 %cmp_16, 0
   ret i1 %res

@badumbatish badumbatish changed the title from "[WebAssembly] [Draft] Legalize i128 to v2i64" to "[WebAssembly] Legalize i128 to v2i64 for setcc" Aug 1, 2025
@badumbatish badumbatish changed the title from "[WebAssembly] Legalize i128 to v2i64 for setcc" to "[WebAssembly] Legalize i128 to v16i8 for setcc" Aug 7, 2025
Contributor

@lukel97 lukel97 left a comment

The PR title should probably be something like "Expand memcmp for 16 byte loads with simd128", since this PR also enables it in WebAssemblyTargetTransformInfo.cpp

@badumbatish badumbatish changed the title from "[WebAssembly] Legalize i128 to v16i8 for setcc" to "[WebAssembly] Legalize i128 to v16i8 for setcc, expand memcmp for 16 byte loads with simd128" Aug 11, 2025
Contributor

@lukel97 lukel97 left a comment

LGTM!

Just for the PR title, I would say Combine i128 to v16i8 for setcc since it's technically a combine, not legalization.

And make sure to flesh out the PR description with a few sentences about how ExpandMemcmp can expand larger 128 bit loads, but they're emitted as i128s and we need to combine them into v16i8 types for efficient lowering.

@badumbatish badumbatish changed the title from "[WebAssembly] Legalize i128 to v16i8 for setcc, expand memcmp for 16 byte loads with simd128" to "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte loads with simd128" Aug 11, 2025
@badumbatish badumbatish merged commit 348f01f into llvm:main Aug 12, 2025
9 checks passed
@dschuff
Member

dschuff commented Aug 13, 2025

It looks like this change has caused a test failure in Emscripten's test suite: the first memcmp in the neon test.
Sorry, I haven't had a chance to reduce it or investigate. If you compile that file with Emscripten (em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js), or compile the preprocessed source, you should be able to reproduce it.

@badumbatish
Contributor Author

thanks Derek, I'll revert and investigate this

badumbatish added a commit that referenced this pull request Aug 13, 2025
… for 16 byte loads with simd128" (#153360)

Reverts #149461

The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it

Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp

Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`

Original comment report:
#149461 (comment)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 13, 2025
…pand memcmp for 16 byte loads with simd128" (#153360)

@badumbatish
Contributor Author

badumbatish commented Aug 13, 2025

I tried adding some simple print debugging to this and found something a bit weird (or interesting): if I print the memcmp result before the assertion, the assertion no longer fails.

For example, change the test loop in the first test to the following and it doesn't crash anymore:

        for (size_t i = 0 ; i < (sizeof(test_vec) / sizeof(test_vec[0])) ; i++) {
                int32x4_t a = vld1q_s32(test_vec[i].a);
                int32x4_t b = vld1q_s32(test_vec[i].b);
                int32x4_t r = vaddq_s32(a, b);
                int32_t r_[4];
                vst1q_s32(r_, r);
                printf("At %zu\n", i);
                printf("Byte of r_  : %02X %02X %02X %02X\n", r_[0], r_[1], r_[2], r_[3]);
                printf("Byte of test: %02X %02X %02X %02X\n", test_vec[i].r[0], test_vec[i].r[1], test_vec[i].r[2], test_vec[i].r[3]);
                // comment or uncomment the following line
                printf("Memcmp result: %d\n\n", memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4));
                assert(memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4) == 0);
        }

If I comment out the line that prints the memcmp result, I get the following error:

Testing NEON Wasm SIMD
At 0
Byte of r_  : 8A64C799 484D47E1 9BBF3942 B38F111F
Byte of test: 8A64C799 484D47E1 9BBF3942 B38F111F
Aborted(Assertion failed: memcmp(r_, test_vec[i].r, sizeof(int32_t) * 4) == 0, at: test/neon/test_neon_wasm_simd.cpp,58,test_simde_vaddq_s32)

When the memcmp result is printed, the build and the run both complete successfully (no assertion error), with the following log.
Note that the commit used by build.wasm is the last commit in this PR before merging.

 em++ test/neon/test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o temp.js -v  && node temp.js
 "/Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang++" -target wasm32-unknown-emscripten -fignore-exceptions -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -D__SSE__=1 -D__ARM_NEON__=1 -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -msimd128 -v -c test/neon/test_neon_wasm_simd.cpp -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o
clang version 22.0.0git (git@github.com:badumbatish/llvm-project.git 9d2b041c6d4d41d18f89f19f54c6fcef68c5e106)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin
Build config: +assertions
 (in-process)
 "/Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang-22" -cc1 -triple wasm32-unknown-emscripten -O2 -emit-obj -disable-free -clear-ast-before-backend -main-file-name test_neon_wasm_simd.cpp -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -target-feature +simd128 -fvisibility=hidden -debugger-tuning=gdb -fdebug-compilation-dir=/Users/jjasmine/Developer/igalia/emscripten -target-linker-version 1167.4.1 -v -fcoverage-compilation-dir=/Users/jjasmine/Developer/igalia/emscripten -resource-dir /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22 -D EMSCRIPTEN -D __SSE__=1 -D __ARM_NEON__=1 -isysroot /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1 -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1 -internal-isystem /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22/include -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include -fdeprecated-macro -ferror-limit 19 -fmessage-length=154 -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fignore-exceptions -fexceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o -x c++ test/neon/test_neon_wasm_simd.cpp
clang -cc1 version 22.0.0git based upon LLVM 22.0.0git default target arm64-apple-darwin24.5.0
ignoring nonexistent directory "/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten/c++/v1"
ignoring nonexistent directory "/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/compat
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/lib/clang/22/include
 /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/clang --version
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/wasm-ld -o temp.wasm /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/test_neon_wasm_simd_0.o -L/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -L/Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/src/lib -lGL-getprocaddr -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/tmphto8ux8elibemscripten_js_symbols.so --strip-debug --export=_emscripten_stack_alloc --export=__wasm_call_ctors --export=emscripten_stack_get_current --export=_emscripten_stack_restore --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-table -z stack-size=65536 --no-growable-memory --initial-heap=16777216 --no-entry --table-base=1 --global-base=1024
 /Users/jjasmine/Developer/igalia/llvm-project/build.wasm/bin/llvm-objcopy temp.wasm temp.wasm '--remove-section=.debug*' --remove-section=producers --remove-section=name
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/compiler.mjs -
 /Users/jjasmine/Developer/igalia/emsdk/upstream/bin/wasm-opt --strip-target-features --post-emscripten -O2 --low-memory-unused --zero-filled-memory --pass-arg=directize-initial-contents-immutable temp.wasm -o temp.wasm --mvp-features --enable-bulk-memory --enable-bulk-memory-opt --enable-call-indirect-overlong --enable-multivalue --enable-mutable-globals --enable-nontrapping-float-to-int --enable-reference-types --enable-sign-ext --enable-simd
 /Users/jjasmine/Developer/igalia/emsdk/upstream/bin/wasm-opt --strip-target-features --post-emscripten -O2 --low-memory-unused --zero-filled-memory --pass-arg=directize-initial-contents-immutable temp.wasm -o temp.wasm --mvp-features --enable-bulk-memory --enable-bulk-memory-opt --enable-call-indirect-overlong --enable-multivalue --enable-mutable-globals --enable-nontrapping-float-to-int --enable-reference-types --enable-sign-ext --enable-simd
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/acorn-optimizer.mjs /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.js JSDCE --minify-whitespace -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.jso1.js
 /Users/jjasmine/Developer/igalia/emsdk/node/22.16.0_64bit/bin/node /Users/jjasmine/Developer/igalia/emsdk/upstream/emscripten/tools/acorn-optimizer.mjs /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.js JSDCE --minify-whitespace -o /var/folders/c0/kq37c8513t97ry4__zw2x5f40000gn/T/emscripten_temp_fwfndkwi/temp.jso1.js
Testing NEON Wasm SIMD
At 0
Byte of r_  : 8A64C799 484D47E1 9BBF3942 B38F111F
Byte of test: 8A64C799 484D47E1 9BBF3942 B38F111F
Memcmp result: 0

At 1
Byte of r_  : DA07F2F6 7D20533A AC17DF8C 945FA5F0
Byte of test: DA07F2F6 7D20533A AC17DF8C 945FA5F0
Memcmp result: 0

At 2
Byte of r_  : E6FA39AC 1A631C8C EBC479FA 277E2320
Byte of test: E6FA39AC 1A631C8C EBC479FA 277E2320
Memcmp result: 0

At 3
Byte of r_  : 3A5C20AD 3452BE3A 591F1838 187E9D39
Byte of test: 3A5C20AD 3452BE3A 591F1838 187E9D39
Memcmp result: 0

At 4
Byte of r_  : 49985D0F 567EEA1C 3BAD9E00 F9542D3A
Byte of test: 49985D0F 567EEA1C 3BAD9E00 F9542D3A
Memcmp result: 0

At 5
Byte of r_  : AD084A90 36028635 5E70B023 1556C5DC
Byte of test: AD084A90 36028635 5E70B023 1556C5DC
Memcmp result: 0

At 6
Byte of r_  : 570C5B21 48E0DE0 9961FDBD AEAEB8C2
Byte of test: 570C5B21 48E0DE0 9961FDBD AEAEB8C2
Memcmp result: 0

At 7
Byte of r_  : 2BE4B64B 812F71C3 331A906F C0E1C947
Byte of test: 2BE4B64B 812F71C3 331A906F C0E1C947
Memcmp result: 0

Success!

DL, MVT::i32),
Cmp});

return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), CC);
Contributor

We're accidentally negating the result, this should be

Suggested change
-return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), CC);
+return DAG.getSetCC(DL, VT, Intr, DAG.getConstant(0, DL, MVT::i32), ISD::SETNE);

I should have caught this in review earlier, sorry! You should open up another PR that reverts the revert and include this fix in it

Contributor Author

can confirm this doesn't trigger the assertion on the neon test, ty for the keen eyes Luke!

badumbatish added a commit to badumbatish/llvm-project that referenced this pull request Aug 14, 2025
badumbatish added a commit that referenced this pull request Aug 15, 2025
…CC (#153703)

This PR reapplies #149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.

For example, with
```llvm
 %cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
  %res = icmp eq i32 %cmp_16, 0
```

the original PR produces all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it is effectively computing icmp ne.

Instead, the PR should have used SETNE in the returning setcc. This way,
all_true returns 1, which is then compared ne 0, which is equivalent
to icmp eq.
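
To make the negation concrete, here is a scalar C++ model of the final setcc for the eq case. `allTrueOfLaneEq` stands in for all_true applied to the lane-wise byte compare; the enum and function names are hypothetical and only mirror the shape of the combine, not the actual SelectionDAG API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

enum class Cond { EQ, NE };

// Stand-in for all_true over a lane-wise equality compare:
// 1 if every byte lane is equal, 0 otherwise.
int allTrueOfLaneEq(const Bytes16 &a, const Bytes16 &b) {
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] != b[i])
      return 0;
  return 1;
}

// Stand-in for a scalar setcc with the given condition code.
bool applyCond(int lhs, int rhs, Cond cc) {
  return cc == Cond::EQ ? lhs == rhs : lhs != rhs;
}

// Original behaviour: the incoming condition code (EQ here) is reused in
// the returning setcc, so the result is "all_true == 0", i.e. negated.
bool buggyEq(const Bytes16 &a, const Bytes16 &b) {
  return applyCond(allTrueOfLaneEq(a, b), 0, Cond::EQ);
}

// Fixed behaviour: the returning setcc always compares ne 0, so the result
// is "all_true != 0", which matches icmp eq on the original 16 bytes.
bool fixedEq(const Bytes16 &a, const Bytes16 &b) {
  return applyCond(allTrueOfLaneEq(a, b), 0, Cond::NE);
}
```

For two identical buffers, `buggyEq` reports false while `fixedEq` reports true, which matches the failing `memcmp(...) == 0` assertion on buffers that printed as equal.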
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 15, 2025
…bine of SETCC (#153703)

Development

Successfully merging this pull request may close these issues.

[WebAssembly] Teach backend that loadv128 is good under -msimd
4 participants